{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9941f037-a7fa-4047-a1b9-ca0fce554549",
   "metadata": {},
   "source": [
    "> lingo: jargon, specialized terminology.\n",
    "\n",
    "- github:\n",
    "    - https://github.com/Huanshere/VideoLingo\n",
    "- official docs:\n",
    "    - https://videolingo.io/docs/introduction\n",
    "- colab:\n",
    "    - https://colab.research.google.com/github/Huanshere/VideoLingo/blob/main/VideoLingo_colab.ipynb\n",
    "- requirements\n",
    "    - WhisperX (runs Whisper locally)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4908ee50-6afc-4640-8de7-672144c87421",
   "metadata": {},
   "source": [
    "- install\n",
    "    ```\n",
    "    git clone https://github.com/Huanshere/VideoLingo.git\n",
    "    cd VideoLingo\n",
    "    \n",
    "    conda create -n video python=3.12\n",
    "    conda activate video\n",
    "    \n",
    "    # Note: the installer is interactive and may prompt for manual input (or need sudo privileges)\n",
    "    python install.py\n",
    "\n",
    "    # Start the web service\n",
    "    streamlit run st.py\n",
    "    ```\n",
    "- docker\n",
    "    ```\n",
    "    docker pull rqlove/videolingo:latest\n",
    "    docker run -d -p 8501:8501 --gpus all rqlove/videolingo:latest\n",
    "    ```\n",
    "- steps\n",
    "    - Download or Upload Video\n",
    "        - This step is done once the video plays in the page.\n",
    "    - Translate and Generate Subtitles\n",
    "    - Dubbing\n",
    "- UI options\n",
    "    - Vocal Separation Enhance\n",
    "        - Recommended for videos with loud background noise, but will increase processing time\n",
    "    - Burn-in subtitles: takes longer\n",
    "- misc\n",
    "    - All settings shown in the UI are loaded from `config.yaml` in the code root directory.\n",
    "    - If the upload is audio only, ffmpeg first generates a video file, audio_with_black_screen.mp4, in the output folder.\n",
    "    - Segmentation: the audio is split into chunks of 600 s (10 min).\n",
    "    - All intermediate files (e.g. subtitle files) are written to the output folder.\n",
    "        - Include Video: output_video_with_subs.mp4"
   ]
  },
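  {
   "cell_type": "markdown",
   "id": "vl-sketch-chunk-bounds",
   "metadata": {},
   "source": [
    "The 600 s chunking noted above can be sketched as below. This is a minimal illustration only, not VideoLingo's actual code (the real pipeline may choose its cut points differently); `chunk_bounds` is a hypothetical helper.\n",
    "\n",
    "```python\n",
    "def chunk_bounds(duration_s, chunk_s=600.0):\n",
    "    # Return (start, end) pairs in seconds that cover [0, duration_s].\n",
    "    duration_s = float(duration_s)\n",
    "    bounds = []\n",
    "    start = 0.0\n",
    "    while start < duration_s:\n",
    "        # The final chunk is clipped to the end of the recording.\n",
    "        end = min(start + chunk_s, duration_s)\n",
    "        bounds.append((start, end))\n",
    "        start = end\n",
    "    return bounds\n",
    "\n",
    "# A 25-minute recording: two full chunks plus a 5-minute remainder.\n",
    "print(chunk_bounds(1500))\n",
    "```\n",
    "\n",
    "The last chunk is simply whatever remains, so it can be much shorter than the others."
   ]
  },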
  {
   "cell_type": "markdown",
   "id": "0edd3ac9-fe21-4729-8c5a-f68186c6673a",
   "metadata": {},
   "source": [
    "### UVR => Demucs"
   ]
  },
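  {
   "cell_type": "markdown",
   "id": "63be466f-c467-4572-a182-e27eaa7441a7",
   "metadata": {},
   "source": [
    "Ultimate Vocal Remover (UVR) is typically used for two tasks:\n",
    "\n",
    "- Vocal removal: strip the vocals from a song and keep the accompaniment (the karaoke treatment).\n",
    "- Vocal extraction: isolate the vocals from a mix to get a clean vocal track.\n",
    "\n",
    "When the background music is loud, skipping vocal separation can make word-level subtitle recognition inaccurate, which then causes errors in the final alignment step.\n",
    "\n",
    "raw_full_audio.wav\n",
    "- original_vocal.wav\n",
    "- background.wav\n",
    "\n",
    "misc\n",
    "- de-echo: echo removal."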
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27721bb5-8dab-4c8c-a596-7c5f7d77fd42",
   "metadata": {},
   "source": [
    "- demucs\n",
    "    - raw.mp3 => background.mp3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69494277-228d-43ee-8658-633865bfd8b3",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    }
   },
   "source": [
    "### ASR"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24525cc3-6fc7-4a4d-baed-58d3a18dd08d",
   "metadata": {},
   "source": [
    "- asr\n",
    "    - chinese: BELLE-2/Belle-whisper-large-v3-zh-punct\n",
    "        - Belle-whisper-large-v3-zh + punctuation mark \n",
    "    - english: large-v3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "420aec3f-4fd0-4e35-be1d-20bc59de350f",
   "metadata": {},
   "source": [
    "### whisperx"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ec2d9d8-ab04-4b62-a81c-34745e9d7c1b",
   "metadata": {},
   "source": [
    "### transformers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63708a3a-5ad7-4149-8d69-4dc4b2a3c7f7",
   "metadata": {},
   "source": [
    "```\n",
    "import torch\n",
    "from transformers import pipeline\n",
    "\n",
    "transcriber = pipeline(\n",
    "  \"automatic-speech-recognition\", \n",
    "  model=\"BELLE-2/Belle-whisper-large-v3-zh-punct\",\n",
    "  device=\"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
    ")\n",
    "\n",
    "transcriber.model.config.forced_decoder_ids = (\n",
    "  transcriber.tokenizer.get_decoder_prompt_ids(\n",
    "    language=\"zh\", \n",
    "    task=\"transcribe\"\n",
    "  )\n",
    ")\n",
    "\n",
    "transcription = transcriber(\"my_audio.wav\")\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a053f062-a239-4a4d-8fca-ec963bda1a26",
   "metadata": {},
   "source": [
    "### Dubbing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f006211f-0dfa-4ed8-ab23-77b76a37f94e",
   "metadata": {},
   "source": [
    "- Dubbing: replacing the original speech or sound in the audio with a new audio track."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff523590-f463-41e0-a079-a8d080c634c1",
   "metadata": {},
   "source": [
    "### LLM"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7ab0300-99f8-4313-b807-887e6d966bb5",
   "metadata": {},
   "source": [
    "- base url: https://api.openai.com/\n",
    "- model: gpt-4o-mini"
   ]
  },
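  {
   "cell_type": "markdown",
   "id": "vl-sketch-chat-request",
   "metadata": {},
   "source": [
    "A minimal sketch of how the two settings above map onto a request, assuming the endpoint follows the OpenAI Chat Completions convention (`/v1/chat/completions`). `build_chat_request` is a hypothetical helper and the API key is a placeholder; the request is only constructed here, never sent.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import urllib.request\n",
    "\n",
    "BASE_URL = 'https://api.openai.com/'\n",
    "MODEL = 'gpt-4o-mini'\n",
    "\n",
    "def build_chat_request(prompt, api_key):\n",
    "    # Build (but do not send) a POST request carrying a single user message.\n",
    "    body = {'model': MODEL, 'messages': [{'role': 'user', 'content': prompt}]}\n",
    "    return urllib.request.Request(\n",
    "        BASE_URL.rstrip('/') + '/v1/chat/completions',\n",
    "        data=json.dumps(body).encode('utf-8'),\n",
    "        headers={\n",
    "            'Content-Type': 'application/json',\n",
    "            'Authorization': 'Bearer ' + api_key,\n",
    "        },\n",
    "        method='POST',\n",
    "    )\n",
    "\n",
    "# Placeholder key for illustration only.\n",
    "req = build_chat_request('ping', api_key='sk-...')\n",
    "print(req.full_url)\n",
    "```\n",
    "\n",
    "Swapping `BASE_URL` is enough to point the same payload at another OpenAI-compatible provider, assuming it exposes the same path."
   ]
  },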
  {
   "cell_type": "markdown",
   "id": "a13ade55-e457-453e-a001-c8bbde474148",
   "metadata": {},
   "source": [
    "#### sentence split by meaning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "775f95bd-e434-472b-8bef-8cd54bc3cb26",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:02:23.263815Z",
     "iopub.status.busy": "2024-12-08T14:02:23.261613Z",
     "iopub.status.idle": "2024-12-08T14:02:23.275495Z",
     "shell.execute_reply": "2024-12-08T14:02:23.273022Z",
     "shell.execute_reply.started": "2024-12-08T14:02:23.263751Z"
    }
   },
   "outputs": [],
   "source": [
    "s = \"### Role\\nYou are a professional and experienced Netflix subtitle splitter in zh.\\n\\n### Task\\nYour task is to split the given subtitle text into **2** parts, each should be less than 20 words.\\n\\n### Requirements\\n1. Try to maintain the coherence of the sentence meaning, split according to Netflix subtitle standards, ensuring the two parts are relatively independent.\\n2. The length of each part should be roughly equal, no part should be less than 3 words, but the integrity of the sentence is more important.\\n3. Prioritize splitting at punctuation marks, such as periods, commas, and conjunctions (e.g., \\\"and\\\", \\\"but\\\", \\\"because\\\", \\\"when\\\", \\\"then\\\", \\\"if\\\", \\\"so\\\", \\\"that\\\").\\n\\n### Steps\\n1. Analyze the grammar and structure of the given text.\\n2. Provide 2 different ways to split the text, each with different split points, output complete sentences (do not change any letters or punctuation), insert [br] tags at the split positions.\\n3. Briefly compare and evaluate the above 2 split methods, considering readability, grammatical structure, and contextual coherence, choose the best split method.\\n4. Give the best split method number, 1 or 2.\\n\\n### Output Format\\nPlease provide your answer in the following JSON format, <<>> represents placeholders:\\n{\\n    \\\"analysis\\\": \\\"Brief analysis of the text structure and split strategy\\\",\\n    \\\"split_1\\\": \\\"<<The first split method, output complete sentences, insert [br] as a delimiter at the split position. e.g. 
this is the first part [br] this is the second part.>>\\\",\\n    \\\"split_2\\\": \\\"<<The second split method>>\\\",\\n    \\\"eval\\\": \\\"<<Unified brief evaluation of the 2 split methods, written in one sentence, no line breaks>>\\\",\\n    \\\"best\\\": \\\"<<The best split method number, 1 or 2>>\\\"\\n}\\n\\n### Given Text\\n<split_this_sentence>\\n个概念都会用到就是Mt-S就是Megatransferspersecond就是每秒传输的大M的这个表示的克瓦特传输的次数我们可以用Mt来衡量再除以1024就是用Gt来衡量\\n</split_this_sentence>\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "aef9462b-63b8-4786-8f4c-3456b668f4a2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:02:24.819807Z",
     "iopub.status.busy": "2024-12-08T14:02:24.818669Z",
     "iopub.status.idle": "2024-12-08T14:02:24.827250Z",
     "shell.execute_reply": "2024-12-08T14:02:24.825559Z",
     "shell.execute_reply.started": "2024-12-08T14:02:24.819775Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "### Role\n",
      "You are a professional and experienced Netflix subtitle splitter in zh.\n",
      "\n",
      "### Task\n",
      "Your task is to split the given subtitle text into **2** parts, each should be less than 20 words.\n",
      "\n",
      "### Requirements\n",
      "1. Try to maintain the coherence of the sentence meaning, split according to Netflix subtitle standards, ensuring the two parts are relatively independent.\n",
      "2. The length of each part should be roughly equal, no part should be less than 3 words, but the integrity of the sentence is more important.\n",
      "3. Prioritize splitting at punctuation marks, such as periods, commas, and conjunctions (e.g., \"and\", \"but\", \"because\", \"when\", \"then\", \"if\", \"so\", \"that\").\n",
      "\n",
      "### Steps\n",
      "1. Analyze the grammar and structure of the given text.\n",
      "2. Provide 2 different ways to split the text, each with different split points, output complete sentences (do not change any letters or punctuation), insert [br] tags at the split positions.\n",
      "3. Briefly compare and evaluate the above 2 split methods, considering readability, grammatical structure, and contextual coherence, choose the best split method.\n",
      "4. Give the best split method number, 1 or 2.\n",
      "\n",
      "### Output Format\n",
      "Please provide your answer in the following JSON format, <<>> represents placeholders:\n",
      "{\n",
      "    \"analysis\": \"Brief analysis of the text structure and split strategy\",\n",
      "    \"split_1\": \"<<The first split method, output complete sentences, insert [br] as a delimiter at the split position. e.g. this is the first part [br] this is the second part.>>\",\n",
      "    \"split_2\": \"<<The second split method>>\",\n",
      "    \"eval\": \"<<Unified brief evaluation of the 2 split methods, written in one sentence, no line breaks>>\",\n",
      "    \"best\": \"<<The best split method number, 1 or 2>>\"\n",
      "}\n",
      "\n",
      "### Given Text\n",
      "<split_this_sentence>\n",
      "个概念都会用到就是Mt-S就是Megatransferspersecond就是每秒传输的大M的这个表示的克瓦特传输的次数我们可以用Mt来衡量再除以1024就是用Gt来衡量\n",
      "</split_this_sentence>\n"
     ]
    }
   ],
   "source": [
    "print(s)"
   ]
  },
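  {
   "cell_type": "markdown",
   "id": "vl-sketch-pick-best-split",
   "metadata": {},
   "source": [
    "The prompt above asks the model for a JSON object with `split_1`, `split_2` and `best`. A minimal sketch of consuming such a reply, assuming the reply is a valid JSON string; `pick_best_split` and the sample reply are hypothetical and hand-written, not taken from VideoLingo or from a real model run.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "def pick_best_split(reply):\n",
    "    # Parse the model reply and return the parts of the chosen split.\n",
    "    data = json.loads(reply)\n",
    "    best = str(data['best']).strip()\n",
    "    if best not in ('1', '2'):\n",
    "        raise ValueError('unexpected best value: ' + best)\n",
    "    chosen = data['split_' + best]\n",
    "    # The prompt uses [br] as the delimiter between the two parts.\n",
    "    parts = [p.strip() for p in chosen.split('[br]')]\n",
    "    return [p for p in parts if p]\n",
    "\n",
    "# Hypothetical reply in the format the prompt requests.\n",
    "reply = json.dumps({\n",
    "    'analysis': '...',\n",
    "    'split_1': 'this is the first part [br] this is the second part.',\n",
    "    'split_2': 'this is [br] the first part this is the second part.',\n",
    "    'eval': '...',\n",
    "    'best': '1',\n",
    "})\n",
    "print(pick_best_split(reply))\n",
    "```\n",
    "\n",
    "Validating `best` before indexing keeps a malformed reply from silently selecting the wrong field."
   ]
  },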
  {
   "cell_type": "markdown",
   "id": "dba53c0b-4bee-49f7-ae7e-7cb129030f1c",
   "metadata": {},
   "source": [
    "#### subtitle trim"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "272b5ac3-ebab-4a8e-8fcd-046bf9092a6a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:05:13.301403Z",
     "iopub.status.busy": "2024-12-08T14:05:13.300772Z",
     "iopub.status.idle": "2024-12-08T14:05:13.312784Z",
     "shell.execute_reply": "2024-12-08T14:05:13.310627Z",
     "shell.execute_reply.started": "2024-12-08T14:05:13.301359Z"
    }
   },
   "outputs": [],
   "source": [
    "s = \"\\n### Role\\nYou are a professional subtitle editor, editing and optimizing lengthy subtitles that exceed voiceover time before handing them to voice actors. Your expertise lies in cleverly shortening subtitles slightly while ensuring the original meaning and structure remain unchanged.\\n\\n### Subtitle Data\\n<subtitles>\\nSubtitle: \\\"希望那一天能尽快到来，以上就是本期的全部内容。\\\"\\nDuration: 3.2719999999999914 seconds\\n</subtitles>\\n\\n### Processing Rules\\nConsider a. Reducing filler words without modifying meaningful content. b. Omitting unnecessary modifiers or pronouns, for example:\\n    - \\\"Please explain your thought process\\\" can be shortened to \\\"Please explain thought process\\\"\\n    - \\\"We need to carefully analyze this complex problem\\\" can be shortened to \\\"We need to analyze this problem\\\"\\n    - \\\"Let's discuss the various different perspectives on this topic\\\" can be shortened to \\\"Let's discuss different perspectives on this topic\\\"\\n    - \\\"Can you describe in detail your experience from yesterday\\\" can be shortened to \\\"Can you describe yesterday's experience\\\" \\n\\n### Processing Steps\\nPlease follow these steps and provide the results in the JSON output:\\n1. Analysis: Briefly analyze the subtitle's structure, key information, and filler words that can be omitted.\\n2. Trimming: Based on the rules and analysis, optimize the subtitle by making it more concise according to the processing rules.\\n\\n### Output Format\\nPlease complete the following JSON data, where << >> represents content you need to fill in:\\n{\\n    \\\"analysis\\\": \\\"<<Brief analysis of the subtitle, including structure, key information, and potential processing locations>>\\\",\\n    \\\"trans_text_processed\\\": \\\"<<Optimized and shortened subtitle in the original subtitle language>>\\\"\\n}\\n\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "bf790c44-25af-4303-93cb-1e071175934f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:05:16.865703Z",
     "iopub.status.busy": "2024-12-08T14:05:16.865075Z",
     "iopub.status.idle": "2024-12-08T14:05:16.875014Z",
     "shell.execute_reply": "2024-12-08T14:05:16.872595Z",
     "shell.execute_reply.started": "2024-12-08T14:05:16.865658Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "### Role\n",
      "You are a professional subtitle editor, editing and optimizing lengthy subtitles that exceed voiceover time before handing them to voice actors. Your expertise lies in cleverly shortening subtitles slightly while ensuring the original meaning and structure remain unchanged.\n",
      "\n",
      "### Subtitle Data\n",
      "<subtitles>\n",
      "Subtitle: \"希望那一天能尽快到来，以上就是本期的全部内容。\"\n",
      "Duration: 3.2719999999999914 seconds\n",
      "</subtitles>\n",
      "\n",
      "### Processing Rules\n",
      "Consider a. Reducing filler words without modifying meaningful content. b. Omitting unnecessary modifiers or pronouns, for example:\n",
      "    - \"Please explain your thought process\" can be shortened to \"Please explain thought process\"\n",
      "    - \"We need to carefully analyze this complex problem\" can be shortened to \"We need to analyze this problem\"\n",
      "    - \"Let's discuss the various different perspectives on this topic\" can be shortened to \"Let's discuss different perspectives on this topic\"\n",
      "    - \"Can you describe in detail your experience from yesterday\" can be shortened to \"Can you describe yesterday's experience\" \n",
      "\n",
      "### Processing Steps\n",
      "Please follow these steps and provide the results in the JSON output:\n",
      "1. Analysis: Briefly analyze the subtitle's structure, key information, and filler words that can be omitted.\n",
      "2. Trimming: Based on the rules and analysis, optimize the subtitle by making it more concise according to the processing rules.\n",
      "\n",
      "### Output Format\n",
      "Please complete the following JSON data, where << >> represents content you need to fill in:\n",
      "{\n",
      "    \"analysis\": \"<<Brief analysis of the subtitle, including structure, key information, and potential processing locations>>\",\n",
      "    \"trans_text_processed\": \"<<Optimized and shortened subtitle in the original subtitle language>>\"\n",
      "}\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4815118-8e3c-4992-a108-955607f5ac86",
   "metadata": {},
   "source": [
    "#### summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "948762dc-5129-467f-a2df-0d2223a8896e",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:06:08.777985Z",
     "iopub.status.busy": "2024-12-08T14:06:08.777314Z",
     "iopub.status.idle": "2024-12-08T14:06:08.791548Z",
     "shell.execute_reply": "2024-12-08T14:06:08.789282Z",
     "shell.execute_reply.started": "2024-12-08T14:06:08.777937Z"
    }
   },
   "outputs": [],
   "source": [
    "s = \"### Role\\nYou are a professional video translation expert and terminology consultant. Your expertise lies not only in accurately understanding the original zh text but also in extracting key professional terms and optimizing the translation to better suit the expression habits and cultural background of Chinese.\\n\\n### Task Description \\nFor the provided original zh video text, you need to:\\n1. Summarize the video's main topic in one sentence\\n2. Extract professional terms and names that appear in the video, and provide Chinese translations or suggest keeping the original language terms. Avoid extracting simple, common words.\\n3. For each translated term or name, provide a brief explanation\\n\\n### Analysis and Summary Steps\\nPlease think in two steps, processing the text line by line:  \\n1. Topic summarization:\\n   - Quickly skim through the entire text to understand the general idea\\n   - Summarize the topic in one concise sentence\\n2. Term and name extraction:\\n   - Carefully read the entire text, marking professional terms and names\\n   - For each term or name, provide a Chinese translation or suggest keeping the original, only the word itself is needed, not the pronunciation\\n   - Add a brief explanation for each term or name to help the translator understand\\n   - If the word is a fixed abbreviation or a proper name, please keep the original.\\n\\n### Output Format\\nPlease output your analysis results in the following JSON format, where <> represents placeholders:\\n{\\n    \\\"theme\\\": \\\"<Briefly summarize the theme of this video in 1 sentence>\\\",\\n    \\\"terms\\\": [\\n        {\\n            \\\"original\\\": \\\"<Term or name 1 in the zh>\\\",\\n            \\\"translation\\\": \\\"<Chinese translation or keep original>\\\",\\n            \\\"explanation\\\": \\\"<Brief explanation of the term or name>\\\"\\n        },\\n        {\\n            \\\"original\\\": \\\"<Term or name 2 in the zh>\\\",\\n            
\\\"translation\\\": \\\"<Chinese translation or keep original>\\\",\\n            \\\"explanation\\\": \\\"<Brief explanation of the term or name>\\\"\\n        },\\n        ...\\n    ]\\n}\\n\\n### Single Output Example (Using French as an example)\\n\\n{\\n    \\\"theme\\\": \\\"Ce vidéo résume le musée du Louvre à Paris.\\\",\\n    \\\"terms\\\": [\\n        {\\n            \\\"original\\\": \\\"Mona Lisa\\\",\\n            \\\"translation\\\": \\\"La Joconde\\\",\\n            \\\"explanation\\\": \\\"Le tableau le plus célèbre du Louvre, un portrait de Léonard de Vinci\\\"\\n        },\\n        {\\n            \\\"original\\\": \\\"pyramid\\\",\\n            \\\"translation\\\": \\\"la pyramide\\\",\\n            \\\"explanation\\\": \\\"Une grande structure en verre et métal en forme de pyramide située à l'entrée principale du Louvre\\\"\\n        },\\n        {\\n            \\\"original\\\": \\\"I.M. Pei\\\",\\n            \\\"translation\\\": \\\"I.M. Pei\\\",\\n            \\\"explanation\\\": \\\"L'architecte américain d'origine chinoise qui a conçu la pyramide du Louvre\\\"\\n        },\\n        ...\\n    ]\\n}\\n\\n### Video text data to be processed\\n<video_text_to_summarize>\\n各 位 朋 友 们 大 家 周 末 好 今 天 我 们 继 续 回 到 小 白 象 深 度 学 习 装 机 指 南...\\n</video_text_to_summarize>\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "7318d02c-4777-4f31-95c8-4518545123ae",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:06:10.092963Z",
     "iopub.status.busy": "2024-12-08T14:06:10.092356Z",
     "iopub.status.idle": "2024-12-08T14:06:10.101905Z",
     "shell.execute_reply": "2024-12-08T14:06:10.099888Z",
     "shell.execute_reply.started": "2024-12-08T14:06:10.092918Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "### Role\n",
      "You are a professional video translation expert and terminology consultant. Your expertise lies not only in accurately understanding the original zh text but also in extracting key professional terms and optimizing the translation to better suit the expression habits and cultural background of Chinese.\n",
      "\n",
      "### Task Description \n",
      "For the provided original zh video text, you need to:\n",
      "1. Summarize the video's main topic in one sentence\n",
      "2. Extract professional terms and names that appear in the video, and provide Chinese translations or suggest keeping the original language terms. Avoid extracting simple, common words.\n",
      "3. For each translated term or name, provide a brief explanation\n",
      "\n",
      "### Analysis and Summary Steps\n",
      "Please think in two steps, processing the text line by line:  \n",
      "1. Topic summarization:\n",
      "   - Quickly skim through the entire text to understand the general idea\n",
      "   - Summarize the topic in one concise sentence\n",
      "2. Term and name extraction:\n",
      "   - Carefully read the entire text, marking professional terms and names\n",
      "   - For each term or name, provide a Chinese translation or suggest keeping the original, only the word itself is needed, not the pronunciation\n",
      "   - Add a brief explanation for each term or name to help the translator understand\n",
      "   - If the word is a fixed abbreviation or a proper name, please keep the original.\n",
      "\n",
      "### Output Format\n",
      "Please output your analysis results in the following JSON format, where <> represents placeholders:\n",
      "{\n",
      "    \"theme\": \"<Briefly summarize the theme of this video in 1 sentence>\",\n",
      "    \"terms\": [\n",
      "        {\n",
      "            \"original\": \"<Term or name 1 in the zh>\",\n",
      "            \"translation\": \"<Chinese translation or keep original>\",\n",
      "            \"explanation\": \"<Brief explanation of the term or name>\"\n",
      "        },\n",
      "        {\n",
      "            \"original\": \"<Term or name 2 in the zh>\",\n",
      "            \"translation\": \"<Chinese translation or keep original>\",\n",
      "            \"explanation\": \"<Brief explanation of the term or name>\"\n",
      "        },\n",
      "        ...\n",
      "    ]\n",
      "}\n",
      "\n",
      "### Single Output Example (Using French as an example)\n",
      "\n",
      "{\n",
      "    \"theme\": \"Ce vidéo résume le musée du Louvre à Paris.\",\n",
      "    \"terms\": [\n",
      "        {\n",
      "            \"original\": \"Mona Lisa\",\n",
      "            \"translation\": \"La Joconde\",\n",
      "            \"explanation\": \"Le tableau le plus célèbre du Louvre, un portrait de Léonard de Vinci\"\n",
      "        },\n",
      "        {\n",
      "            \"original\": \"pyramid\",\n",
      "            \"translation\": \"la pyramide\",\n",
      "            \"explanation\": \"Une grande structure en verre et métal en forme de pyramide située à l'entrée principale du Louvre\"\n",
      "        },\n",
      "        {\n",
      "            \"original\": \"I.M. Pei\",\n",
      "            \"translation\": \"I.M. Pei\",\n",
      "            \"explanation\": \"L'architecte américain d'origine chinoise qui a conçu la pyramide du Louvre\"\n",
      "        },\n",
      "        ...\n",
      "    ]\n",
      "}\n",
      "\n",
      "### Video text data to be processed\n",
      "<video_text_to_summarize>\n",
      "各 位 朋 友 们 大 家 周 末 好 今 天 我 们 继 续 回 到 小 白 象 深 度 学 习 装 机 指 南...\n",
      "</video_text_to_summarize>\n"
     ]
    }
   ],
   "source": [
    "print(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99dfaedf-7aa2-4aec-9aab-82dd63e65ee2",
   "metadata": {},
   "source": [
    "#### translate_expressiveness"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "11dfa511-2cd3-46ea-bba3-ae4da0e64da8",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:06:45.870354Z",
     "iopub.status.busy": "2024-12-08T14:06:45.869720Z",
     "iopub.status.idle": "2024-12-08T14:06:45.889300Z",
     "shell.execute_reply": "2024-12-08T14:06:45.887547Z",
     "shell.execute_reply.started": "2024-12-08T14:06:45.870306Z"
    }
   },
   "outputs": [],
   "source": [
    "s = \"### Role Definition\\nYou are a professional Netflix subtitle translator and language consultant. Your expertise lies not only in accurately understanding the original zh but also in optimizing the Chinese translation to better suit the target language's expression habits and cultural background.\\n\\n### Task Background\\nWe already have a direct translation version of the original zh subtitles. Now we need you to reflect on and improve these direct translations to create more natural and fluent Chinese subtitles.\\n\\n### Task Description\\nBased on the provided original zh text and Chinese direct translation, you need to:\\n1. Analyze the direct translation results line by line, pointing out existing issues\\n2. Provide detailed modification suggestions\\n3. Perform free translation based on your analysis\\n\\n### Context Information\\n<previous_content>\\n['然后A100的话我们再来看这个A100就是A100的话它是5120bit', '它的位宽是非常高的它是一个数量级的一个差距它是5120位它的', '显存类型就不再是DDR这个系列了而是HBM就是高带宽的一个Memory']\\n</previous_content>\\n\\n<subsequent_content>\\n['也是得到了128位内存带宽的计算的话就是这个地方就是频率乘以你这个位宽', '再除以8']\\n</subsequent_content>\\n\\n### Content Summary\\n本视频讨论了内存带宽在深度学习和人工智能领域中的重要性，并对比了英伟达和苹果M4系列芯片的性能。\\n\\n### Points to Note\\n3. \\\"M4系列\\\": \\\"M4系列\\\", meaning: 苹果近期发布的一系列高性能芯片，专注于移动设备和计算任务，尤其在神经网络运算方面有突出的表现。\\n4. \\\"显存\\\": \\\"显存\\\", meaning: 显卡专用的内存，用于存储图像数据和其他相关信息，以支持实时图形处理。\\n\\n### Translation Analysis Steps\\nPlease use a two-step thinking process to handle the text line by line:\\n\\n1. Direct Translation Reflection:\\n   - Evaluate language fluency\\n   - Check if the language style is consistent with the original text\\n   - Check the conciseness of the subtitles, point out where the translation is too wordy, the translation should be close to the original text in length\\n\\n2. 
Chinese Free Translation:\\n   - Based on the reflection in step 1, perform free translation\\n   - Aim for contextual smoothness and naturalness, conforming to Chinese expression habits\\n   - Ensure it's easy for Chinese audience to understand and accept\\n   - Keep the subtitles concise, with a plain and natural language style, and maintain consistency in structure between the free translation and the zh original\\n\\n### Subtitle Data\\n<subtitles>\\n其他的数据A100我就不再算了\\n我们重点来看M4系列的芯片\\nM4系列芯片的话它有分M4M4Pro和M4Max它们的内存的类型就是\\nLPDDR5X它的显存频率它的内存频率是7500转\\n然后M4Pro和M4Max它用的是8533MT-S好它的内存位宽\\n的话M4的内存位宽是64比特乘以2\\n应该是两个通道这个2的含义\\n我不是具体特别清楚它最终种子\\n它的内存位宽是128位当然VK里边它又介绍\\n了这是一个我们看M4的话它这里边有一个8乘以16应该是8乘以16\\n</subtitles>\\n\\n### Output Format\\nMake sure to generate the correct Json format, don't output \\\" in the value.\\nPlease complete the following JSON data, where << >> represents placeholders that should not appear in your answer, and return your translation results in JSON format:\\n{\\n    \\\"1\\\": {\\n        \\\"origin\\\": \\\"其他的数据A100我就不再算了\\\",\\n        \\\"direct\\\": \\\"我就不再考虑其他的数据A100了\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation version>>\\\",\\n        \\\"free\\\": \\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    },\\n    \\\"2\\\": {\\n        \\\"origin\\\": \\\"我们重点来看M4系列的芯片\\\",\\n        \\\"direct\\\": \\\"让我们重点关注M4系列芯片\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation version>>\\\",\\n        \\\"free\\\": \\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    },\\n    \\\"3\\\": {\\n        \\\"origin\\\": \\\"M4系列芯片的话它有分M4M4Pro和M4Max它们的内存的类型就是\\\",\\n        \\\"direct\\\": \\\"M4系列芯片分为M4、M4 Pro和M4 Max，它们的内存类型是\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation 
version>>\\\",\\n        \\\"free\\\": \\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    },\\n    \\\"4\\\": {\\n        \\\"origin\\\": \\\"LPDDR5X它的显存频率它的内存频率是7500转\\\",\\n        \\\"direct\\\": \\\"LPDDR5X，其显存频率和内存频率为7500MHz\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation version>>\\\",\\n        \\\"free\\\": \\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    },\\n    \\\"5\\\": {\\n        \\\"origin\\\": \\\"然后M4Pro和M4Max它用的是8533MT-S好它的内存位宽\\\",\\n        \\\"direct\\\": \\\"而M4 Pro和M4 Max采用的是8533MT-S，其内存位宽\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation version>>\\\",\\n        \\\"free\\\": \\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    },\\n    \\\"6\\\": {\\n        \\\"origin\\\": \\\"的话M4的内存位宽是64比特乘以2\\\",\\n        \\\"direct\\\": \\\"则M4的内存位宽为64位乘以2\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation version>>\\\",\\n        \\\"free\\\": \\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    },\\n    \\\"7\\\": {\\n        \\\"origin\\\": \\\"应该是两个通道这个2的含义\\\",\\n        \\\"direct\\\": \\\"这意味着有两个通道，这是2的含义\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation version>>\\\",\\n        \\\"free\\\": \\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    },\\n    \\\"8\\\": {\\n        \\\"origin\\\": \\\"我不是具体特别清楚它最终种子\\\",\\n        \\\"direct\\\": \\\"我不太清楚它的最终配置\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation version>>\\\",\\n        \\\"free\\\": 
\\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    },\\n    \\\"9\\\": {\\n        \\\"origin\\\": \\\"它的内存位宽是128位当然VK里边它又介绍\\\",\\n        \\\"direct\\\": \\\"它的内存位宽为128位，VK中也提到过\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation version>>\\\",\\n        \\\"free\\\": \\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    },\\n    \\\"10\\\": {\\n        \\\"origin\\\": \\\"了这是一个我们看M4的话它这里边有一个8乘以16应该是8乘以16\\\",\\n        \\\"direct\\\": \\\"这是一个，我们看M4，它的内部有8乘以16，应该是8乘以16\\\",\\n        \\\"reflection\\\": \\\"<<reflection on the direct translation version>>\\\",\\n        \\\"free\\\": \\\"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\\\"\\n    }\\n}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "4259e8eb-cbd8-4695-9851-340f8bccd7ea",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:06:48.421493Z",
     "iopub.status.busy": "2024-12-08T14:06:48.420864Z",
     "iopub.status.idle": "2024-12-08T14:06:48.430744Z",
     "shell.execute_reply": "2024-12-08T14:06:48.428585Z",
     "shell.execute_reply.started": "2024-12-08T14:06:48.421448Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "### Role Definition\n",
      "You are a professional Netflix subtitle translator and language consultant. Your expertise lies not only in accurately understanding the original zh but also in optimizing the Chinese translation to better suit the target language's expression habits and cultural background.\n",
      "\n",
      "### Task Background\n",
      "We already have a direct translation version of the original zh subtitles. Now we need you to reflect on and improve these direct translations to create more natural and fluent Chinese subtitles.\n",
      "\n",
      "### Task Description\n",
      "Based on the provided original zh text and Chinese direct translation, you need to:\n",
      "1. Analyze the direct translation results line by line, pointing out existing issues\n",
      "2. Provide detailed modification suggestions\n",
      "3. Perform free translation based on your analysis\n",
      "\n",
      "### Context Information\n",
      "<previous_content>\n",
      "['然后A100的话我们再来看这个A100就是A100的话它是5120bit', '它的位宽是非常高的它是一个数量级的一个差距它是5120位它的', '显存类型就不再是DDR这个系列了而是HBM就是高带宽的一个Memory']\n",
      "</previous_content>\n",
      "\n",
      "<subsequent_content>\n",
      "['也是得到了128位内存带宽的计算的话就是这个地方就是频率乘以你这个位宽', '再除以8']\n",
      "</subsequent_content>\n",
      "\n",
      "### Content Summary\n",
      "本视频讨论了内存带宽在深度学习和人工智能领域中的重要性，并对比了英伟达和苹果M4系列芯片的性能。\n",
      "\n",
      "### Points to Note\n",
      "3. \"M4系列\": \"M4系列\", meaning: 苹果近期发布的一系列高性能芯片，专注于移动设备和计算任务，尤其在神经网络运算方面有突出的表现。\n",
      "4. \"显存\": \"显存\", meaning: 显卡专用的内存，用于存储图像数据和其他相关信息，以支持实时图形处理。\n",
      "\n",
      "### Translation Analysis Steps\n",
      "Please use a two-step thinking process to handle the text line by line:\n",
      "\n",
      "1. Direct Translation Reflection:\n",
      "   - Evaluate language fluency\n",
      "   - Check if the language style is consistent with the original text\n",
      "   - Check the conciseness of the subtitles, point out where the translation is too wordy, the translation should be close to the original text in length\n",
      "\n",
      "2. Chinese Free Translation:\n",
      "   - Based on the reflection in step 1, perform free translation\n",
      "   - Aim for contextual smoothness and naturalness, conforming to Chinese expression habits\n",
      "   - Ensure it's easy for Chinese audience to understand and accept\n",
      "   - Keep the subtitles concise, with a plain and natural language style, and maintain consistency in structure between the free translation and the zh original\n",
      "\n",
      "### Subtitle Data\n",
      "<subtitles>\n",
      "其他的数据A100我就不再算了\n",
      "我们重点来看M4系列的芯片\n",
      "M4系列芯片的话它有分M4M4Pro和M4Max它们的内存的类型就是\n",
      "LPDDR5X它的显存频率它的内存频率是7500转\n",
      "然后M4Pro和M4Max它用的是8533MT-S好它的内存位宽\n",
      "的话M4的内存位宽是64比特乘以2\n",
      "应该是两个通道这个2的含义\n",
      "我不是具体特别清楚它最终种子\n",
      "它的内存位宽是128位当然VK里边它又介绍\n",
      "了这是一个我们看M4的话它这里边有一个8乘以16应该是8乘以16\n",
      "</subtitles>\n",
      "\n",
      "### Output Format\n",
      "Make sure to generate the correct Json format, don't output \" in the value.\n",
      "Please complete the following JSON data, where << >> represents placeholders that should not appear in your answer, and return your translation results in JSON format:\n",
      "{\n",
      "    \"1\": {\n",
      "        \"origin\": \"其他的数据A100我就不再算了\",\n",
      "        \"direct\": \"我就不再考虑其他的数据A100了\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    },\n",
      "    \"2\": {\n",
      "        \"origin\": \"我们重点来看M4系列的芯片\",\n",
      "        \"direct\": \"让我们重点关注M4系列芯片\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    },\n",
      "    \"3\": {\n",
      "        \"origin\": \"M4系列芯片的话它有分M4M4Pro和M4Max它们的内存的类型就是\",\n",
      "        \"direct\": \"M4系列芯片分为M4、M4 Pro和M4 Max，它们的内存类型是\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    },\n",
      "    \"4\": {\n",
      "        \"origin\": \"LPDDR5X它的显存频率它的内存频率是7500转\",\n",
      "        \"direct\": \"LPDDR5X，其显存频率和内存频率为7500MHz\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    },\n",
      "    \"5\": {\n",
      "        \"origin\": \"然后M4Pro和M4Max它用的是8533MT-S好它的内存位宽\",\n",
      "        \"direct\": \"而M4 Pro和M4 Max采用的是8533MT-S，其内存位宽\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    },\n",
      "    \"6\": {\n",
      "        \"origin\": \"的话M4的内存位宽是64比特乘以2\",\n",
      "        \"direct\": \"则M4的内存位宽为64位乘以2\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    },\n",
      "    \"7\": {\n",
      "        \"origin\": \"应该是两个通道这个2的含义\",\n",
      "        \"direct\": \"这意味着有两个通道，这是2的含义\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    },\n",
      "    \"8\": {\n",
      "        \"origin\": \"我不是具体特别清楚它最终种子\",\n",
      "        \"direct\": \"我不太清楚它的最终配置\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    },\n",
      "    \"9\": {\n",
      "        \"origin\": \"它的内存位宽是128位当然VK里边它又介绍\",\n",
      "        \"direct\": \"它的内存位宽为128位，VK中也提到过\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    },\n",
      "    \"10\": {\n",
      "        \"origin\": \"了这是一个我们看M4的话它这里边有一个8乘以16应该是8乘以16\",\n",
      "        \"direct\": \"这是一个，我们看M4，它的内部有8乘以16，应该是8乘以16\",\n",
      "        \"reflection\": \"<<reflection on the direct translation version>>\",\n",
      "        \"free\": \"<<retranslated result, aiming for fluency and naturalness, conforming to Chinese expression habits, DO NOT leave empty line here!>>\"\n",
      "    }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "print(s)"
   ]
  },
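  {
   "cell_type": "markdown",
   "id": "510275da-188a-4fbf-b90e-662cb01a208b",
   "metadata": {},
   "source": [
    "#### translate_faithfulness\n",
    "\n",
    "- Prompt for the direct (faithful) translation step: each subtitle chunk is translated line by line.\n",
    "- Context passed in: the previous/subsequent lines, a content summary, and a terminology list (Points to Note).\n",
    "- In this run both the source language (`zh`) and the target language are Chinese."
   ]
  },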
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "f4e3c906-6cf9-4237-8c85-208a36bdb68f",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:07:14.231131Z",
     "iopub.status.busy": "2024-12-08T14:07:14.230504Z",
     "iopub.status.idle": "2024-12-08T14:07:14.245681Z",
     "shell.execute_reply": "2024-12-08T14:07:14.243396Z",
     "shell.execute_reply.started": "2024-12-08T14:07:14.231085Z"
    }
   },
   "outputs": [],
   "source": [
    "s = \"### Role Definition\\nYou are a professional Netflix subtitle translator, fluent in both zh and Chinese, as well as their respective cultures. Your expertise lies in accurately understanding the semantics and structure of the original zh text and faithfully translating it into Chinese while preserving the original meaning.\\n\\n### Task Background\\nWe have a segment of original zh subtitles that need to be directly translated into Chinese. These subtitles come from a specific context and may contain specific themes and terminology.\\n\\n### Task Description\\nBased on the provided original zh subtitles, you need to:\\n1. Translate the original zh subtitles into Chinese line by line\\n2. Ensure the translation is faithful to the original, accurately conveying the original meaning\\n3. Consider the context and professional terminology\\n\\n### Context Information\\n<previous_content>\\n['我不是具体特别清楚它最终种子', '它的内存位宽是128位当然VK里边它又介绍', '了这是一个我们看M4的话它这里边有一个8乘以16应该是8乘以16']\\n</previous_content>\\n\\n<subsequent_content>\\n['只是它的算术更快它的频率更', '快8533大概提升了1000好']\\n</subsequent_content>\\n\\n### Content Summary\\n本视频讨论了内存带宽在深度学习和人工智能领域中的重要性，并对比了英伟达和苹果M4系列芯片的性能。\\n\\n### Points to Note\\n1. \\\"内存带宽\\\": \\\"内存带宽\\\", meaning: 指计算机中内存能够传输数据的速率，通常以每秒传输的数据量（如GB/s或bits/s）来衡量。\\n\\n### Translation Principles\\n1. Faithful to the original: Accurately convey the content and meaning of the original text, without arbitrarily changing, adding, or omitting content.\\n2. Accurate terminology: Use professional terms correctly and maintain consistency in terminology.\\n3. 
Understand the context: Fully comprehend and reflect the background and contextual relationships of the text.\\n\\n### Subtitle Data\\n<subtitles>\\n也是得到了128位内存带宽的计算的话就是这个地方就是频率乘以你这个位宽\\n再除以8\\n因为目前还是一个m然后再除以1000就变成了一个gb每秒就120gb每秒\\n我们看这个数据也是对的120就是8乘以16表示它\\n这个controllers这个我不知道什么含义它的意思就是8乘以16\\n就是每一个内存的一个位宽是16比特\\n一共8个这样的controllers就16×8是\\n128个比特位宽然后最终算下来的话就是120GB每秒\\n然后对于MSPro和MSMax它用到的是同样的\\n我们看它用到的是同一个内存的一个类型LPDDR5X\\n</subtitles>\\n\\n### Output Format\\nPlease complete the following JSON data, where << >> represents placeholders that should not appear in your answer, and return your translation results in JSON format:\\n{\\n    \\\"1\\\": {\\n        \\\"origin\\\": \\\"也是得到了128位内存带宽的计算的话就是这个地方就是频率乘以你这个位宽\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese translation>>\\\"\\n    },\\n    \\\"2\\\": {\\n        \\\"origin\\\": \\\"再除以8\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese translation>>\\\"\\n    },\\n    \\\"3\\\": {\\n        \\\"origin\\\": \\\"因为目前还是一个m然后再除以1000就变成了一个gb每秒就120gb每秒\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese translation>>\\\"\\n    },\\n    \\\"4\\\": {\\n        \\\"origin\\\": \\\"我们看这个数据也是对的120就是8乘以16表示它\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese translation>>\\\"\\n    },\\n    \\\"5\\\": {\\n        \\\"origin\\\": \\\"这个controllers这个我不知道什么含义它的意思就是8乘以16\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese translation>>\\\"\\n    },\\n    \\\"6\\\": {\\n        \\\"origin\\\": \\\"就是每一个内存的一个位宽是16比特\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese translation>>\\\"\\n    },\\n    \\\"7\\\": {\\n        \\\"origin\\\": \\\"一共8个这样的controllers就16×8是\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese translation>>\\\"\\n    },\\n    \\\"8\\\": {\\n        \\\"origin\\\": \\\"128个比特位宽然后最终算下来的话就是120GB每秒\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese translation>>\\\"\\n    },\\n    \\\"9\\\": {\\n        \\\"origin\\\": \\\"然后对于MSPro和MSMax它用到的是同样的\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese 
translation>>\\\"\\n    },\\n    \\\"10\\\": {\\n        \\\"origin\\\": \\\"我们看它用到的是同一个内存的一个类型LPDDR5X\\\",\\n        \\\"direct\\\": \\\"<<direct Chinese translation>>\\\"\\n    }\\n}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "460161de-634c-474e-86b9-f6636252339a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2024-12-08T14:07:16.761720Z",
     "iopub.status.busy": "2024-12-08T14:07:16.761099Z",
     "iopub.status.idle": "2024-12-08T14:07:16.770997Z",
     "shell.execute_reply": "2024-12-08T14:07:16.768965Z",
     "shell.execute_reply.started": "2024-12-08T14:07:16.761674Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "### Role Definition\n",
      "You are a professional Netflix subtitle translator, fluent in both zh and Chinese, as well as their respective cultures. Your expertise lies in accurately understanding the semantics and structure of the original zh text and faithfully translating it into Chinese while preserving the original meaning.\n",
      "\n",
      "### Task Background\n",
      "We have a segment of original zh subtitles that need to be directly translated into Chinese. These subtitles come from a specific context and may contain specific themes and terminology.\n",
      "\n",
      "### Task Description\n",
      "Based on the provided original zh subtitles, you need to:\n",
      "1. Translate the original zh subtitles into Chinese line by line\n",
      "2. Ensure the translation is faithful to the original, accurately conveying the original meaning\n",
      "3. Consider the context and professional terminology\n",
      "\n",
      "### Context Information\n",
      "<previous_content>\n",
      "['我不是具体特别清楚它最终种子', '它的内存位宽是128位当然VK里边它又介绍', '了这是一个我们看M4的话它这里边有一个8乘以16应该是8乘以16']\n",
      "</previous_content>\n",
      "\n",
      "<subsequent_content>\n",
      "['只是它的算术更快它的频率更', '快8533大概提升了1000好']\n",
      "</subsequent_content>\n",
      "\n",
      "### Content Summary\n",
      "本视频讨论了内存带宽在深度学习和人工智能领域中的重要性，并对比了英伟达和苹果M4系列芯片的性能。\n",
      "\n",
      "### Points to Note\n",
      "1. \"内存带宽\": \"内存带宽\", meaning: 指计算机中内存能够传输数据的速率，通常以每秒传输的数据量（如GB/s或bits/s）来衡量。\n",
      "\n",
      "### Translation Principles\n",
      "1. Faithful to the original: Accurately convey the content and meaning of the original text, without arbitrarily changing, adding, or omitting content.\n",
      "2. Accurate terminology: Use professional terms correctly and maintain consistency in terminology.\n",
      "3. Understand the context: Fully comprehend and reflect the background and contextual relationships of the text.\n",
      "\n",
      "### Subtitle Data\n",
      "<subtitles>\n",
      "也是得到了128位内存带宽的计算的话就是这个地方就是频率乘以你这个位宽\n",
      "再除以8\n",
      "因为目前还是一个m然后再除以1000就变成了一个gb每秒就120gb每秒\n",
      "我们看这个数据也是对的120就是8乘以16表示它\n",
      "这个controllers这个我不知道什么含义它的意思就是8乘以16\n",
      "就是每一个内存的一个位宽是16比特\n",
      "一共8个这样的controllers就16×8是\n",
      "128个比特位宽然后最终算下来的话就是120GB每秒\n",
      "然后对于MSPro和MSMax它用到的是同样的\n",
      "我们看它用到的是同一个内存的一个类型LPDDR5X\n",
      "</subtitles>\n",
      "\n",
      "### Output Format\n",
      "Please complete the following JSON data, where << >> represents placeholders that should not appear in your answer, and return your translation results in JSON format:\n",
      "{\n",
      "    \"1\": {\n",
      "        \"origin\": \"也是得到了128位内存带宽的计算的话就是这个地方就是频率乘以你这个位宽\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    },\n",
      "    \"2\": {\n",
      "        \"origin\": \"再除以8\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    },\n",
      "    \"3\": {\n",
      "        \"origin\": \"因为目前还是一个m然后再除以1000就变成了一个gb每秒就120gb每秒\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    },\n",
      "    \"4\": {\n",
      "        \"origin\": \"我们看这个数据也是对的120就是8乘以16表示它\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    },\n",
      "    \"5\": {\n",
      "        \"origin\": \"这个controllers这个我不知道什么含义它的意思就是8乘以16\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    },\n",
      "    \"6\": {\n",
      "        \"origin\": \"就是每一个内存的一个位宽是16比特\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    },\n",
      "    \"7\": {\n",
      "        \"origin\": \"一共8个这样的controllers就16×8是\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    },\n",
      "    \"8\": {\n",
      "        \"origin\": \"128个比特位宽然后最终算下来的话就是120GB每秒\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    },\n",
      "    \"9\": {\n",
      "        \"origin\": \"然后对于MSPro和MSMax它用到的是同样的\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    },\n",
      "    \"10\": {\n",
      "        \"origin\": \"我们看它用到的是同一个内存的一个类型LPDDR5X\",\n",
      "        \"direct\": \"<<direct Chinese translation>>\"\n",
      "    }\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "print(s)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
